Two Heuristics for Solving POMDPs Having a Delayed Need to Observe

Authors

  • Valentina Bayer Zubek
  • Thomas Dietterich
Abstract

A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP, evaluating the leaf nodes using the MDP value function. A problem with this approximation is that it does not account for the need to choose actions in order to gain information about the state of the world, particularly when those observation actions are needed at some point in the future. This paper proposes two heuristics that are better than the MDP approximation in POMDPs where there is a delayed need to observe. The first approximation, introduced in earlier work, is the even-odd POMDP, in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted into an equivalent MDP, the even MDP, whose value function captures some of the sensing costs of the original POMDP. An online policy, consisting of a one-step lookahead search combined with the value function of the even MDP, gives an approximation to the POMDP's value function that is at least as good as the method based on the value function of the underlying MDP. The second POMDP approximation is applicable to a special kind of POMDP, which we call the Cost-Observable Markov Decision Problem (COMDP). In a COMDP, the actions are partitioned into those that change the state of the world and those that are pure observation actions. For such problems, we describe the chain MDP algorithm, which in many cases is able to capture more of the sensing costs than the even-odd POMDP approximation. We prove that both heuristics compute value functions that are upper bounded by (i.e., better than) the value function of the underlying MDP, and, in the case of the even MDP, also lower bounded by the POMDP's optimal value function. We show cases where the chain MDP online policy is better than, equal to, or worse than the even MDP online policy.
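The bounds claimed in the abstract can be restated compactly. Writing V*_POMDP for the optimal POMDP value function, and V_MDP and V_evenMDP for the value functions of the underlying MDP and the even MDP (notation assumed here, not taken from the paper), and applying a state value function V to a belief b via V(b) = sum_s b(s) V(s), the claim for the even MDP is a sandwich bound at every belief state:

$$ V^{*}_{\mathrm{POMDP}}(b) \;\le\; V_{\mathrm{evenMDP}}(b) \;\le\; V_{\mathrm{MDP}}(b). $$

Because the MDP value function optimistically assumes free full observability, a smaller value here is a tighter, and hence better, approximation; for the chain MDP only the upper half of the bound, V_chainMDP(b) <= V_MDP(b), is asserted.

Below is a minimal sketch (not the authors' code) of the kind of online policy the abstract describes: a one-step lookahead in belief space whose leaf beliefs are scored by a state-based value function. Passing the underlying MDP's value function gives the baseline MDP approximation; passing the even MDP's value function gives the even-MDP online policy. All names and the dict-of-dicts model encoding (T, Z, R, V_leaf) are conventions of this sketch, not of the paper.

    from collections import defaultdict

    def belief_update(b, a, o, T, Z):
        """Bayes filter: b'(s') is proportional to
        Z[a][s'][o] * sum_s T[s][a][s'] * b(s)."""
        b_new = defaultdict(float)
        for s, p in b.items():
            for s2, pt in T[s][a].items():
                b_new[s2] += p * pt * Z[a][s2].get(o, 0.0)
        norm = sum(b_new.values())
        return {s: q / norm for s, q in b_new.items()} if norm > 0 else None

    def one_step_lookahead(b, actions, observations, T, Z, R, V_leaf, gamma=0.95):
        """Pick the action maximizing expected immediate reward plus the
        discounted score of the successor belief, where a successor belief b'
        is scored as sum_s b'(s) * V_leaf[s].
        V_leaf = underlying MDP's value function -> baseline MDP heuristic;
        V_leaf = even MDP's value function       -> even-MDP online policy."""
        best_a, best_q = None, float("-inf")
        for a in actions:
            # expected immediate reward under the current belief
            q = sum(p * R[s][a] for s, p in b.items())
            for o in observations:
                # probability of observing o after taking a in belief b
                p_o = sum(p * pt * Z[a][s2].get(o, 0.0)
                          for s, p in b.items()
                          for s2, pt in T[s][a].items())
                if p_o > 0.0:
                    b2 = belief_update(b, a, o, T, Z)
                    # evaluate the leaf belief with the supplied value function
                    q += gamma * p_o * sum(p * V_leaf[s] for s, p in b2.items())
            if q > best_q:
                best_a, best_q = a, q
        return best_a

Here T[s][a] maps successor states to probabilities, Z[a][s'] maps observations to probabilities, and R[s][a] is the expected immediate reward; swapping V_leaf is the only change needed to move between the two heuristics.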

Similar Articles

Improved teaching–learning-based and JAYA optimization algorithms for solving flexible flow shop scheduling problems

The flexible flow shop (or hybrid flow shop) scheduling problem is an extension of the classical flow shop scheduling problem. In a simple flow shop configuration, a job with ‘g’ operations is processed at ‘g’ operation centres (stages), each stage having only one machine. If any stage contains more than one machine to provide an alternate processing facility, then the problem...

Delayed observation planning in partially observable domains

Traditional models for planning under uncertainty, such as Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs), assume that observations about the results of agent actions are instantly available to the agent. As a result, they are not applicable to domains where observations are received with delays caused by temporary unavailability of information (e.g. delayed resp...

Decentralized POMDPs

This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework. In a Dec-POMDP, a team of agents collaborates to maximize a global reward based on local information only. This means that agents do not observe a Markovian signal during execution and therefore the agents’ individual policies map from histories to actions. Searching for an optimal joint policy is an extremely h...

Q-value Heuristics for Approximate Solutions of Dec-POMDPs

The Dec-POMDP is a model for multi-agent planning under uncertainty that has received increasing attention in recent years. In this work we propose a new heuristic, QBG, that can be used in various algorithms for Dec-POMDPs, and we describe its differences from and similarities to QMDP and QPOMDP. An experimental evaluation shows that, at the price of some computation, QBG gives a consistently ...

Ben-Gurion University of the Negev, Department of Computer Science: Learning and Solving Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents acting in a stochastic domain under partial observability. POMDPs optimally balance key properties such as the need for information and the sum of collected rewards. However, POMDPs are difficult to use for two reasons: first, it is difficult to obtain the environment dynamics, and second, even given...

Journal:

Volume   Issue

Pages:  -

Publication date: 2012